Web Crawler Implementation
Common Libraries
Python >= 3.10
- requests
- bs4 (BeautifulSoup)
- selenium
- pandas
- jupyter
Static Web Page
Libraries
- Requests
- BeautifulSoup
- Selenium
- Pandas
Implementation
- PTT
- Pandas read_html
  https://github.com/uuboyscy/course-PyETL/blob/master/part06_usefulPackages/02_pandas_read_html.ipynb
- POST
  Sample URL:
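For the PTT item above, a static page can usually be fetched with Requests and parsed with BeautifulSoup. The following is a minimal sketch, not the course's own code: the board URL, the `div.title a` selector, and the helper names `parse_titles` and `fetch_titles` are assumptions made for illustration, and the selector should be checked against the live page markup.

```python
import requests
from bs4 import BeautifulSoup

# Assumed board index URL; swap in the board you actually want to crawl.
BOARD_URL = "https://www.ptt.cc/bbs/movie/index.html"
# A browser-like User-Agent; some sites may reject the default client string.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def parse_titles(html: str) -> list[dict[str, str]]:
    """Extract article titles and links from a board index page.

    Assumes each article link sits inside <div class="title"><a href=...>.
    Rows without a link (for example deleted posts) are skipped.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for link in soup.select("div.title a"):
        rows.append(
            {"title": link.get_text(strip=True), "href": link.get("href", "")}
        )
    return rows


def fetch_titles(url: str = BOARD_URL) -> list[dict[str, str]]:
    """Download one static page with GET and parse it."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return parse_titles(response.text)


# Usage (performs a real HTTP request):
# for row in fetch_titles():
#     print(row["title"], row["href"])
```

Keeping the parsing step separate from the download makes it possible to check the selector against a saved HTML file before crawling the site.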
Dynamic Web Page
Libraries
- Requests
- JSON
Implementation
- Nownews (GET)
  https://github.com/uuboyscy/course-PyETL/blob/master/part05_dynamicWebPage/06_nownews.py
- Newmobilelife (POST)
  https://github.com/uuboyscy/course-PyETL/blob/master/part05_dynamicWebPage/05_newmobilelife.py
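Dynamic pages like the two examples above typically load their content from a JSON endpoint, which can be called directly with GET or POST. The sketch below shows the general pattern only: the endpoint URL, the request parameters, and the `{"items": [{"title": ...}]}` response shape are hypothetical, as are the helper names `fetch_json` and `extract_titles`. The real endpoints and fields have to be read from the browser's network panel or from the linked scripts.

```python
import json

import requests

# A browser-like User-Agent; some sites may reject the default client string.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def fetch_json(url, method="GET", params=None, data=None):
    """Call a JSON endpoint with GET (query params) or POST (form data)."""
    response = requests.request(
        method, url, params=params, data=data, headers=HEADERS, timeout=10
    )
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return json.loads(response.text)


def extract_titles(payload):
    """Pull titles out of the assumed {"items": [{"title": ...}]} shape."""
    return [item["title"] for item in payload.get("items", []) if "title" in item]


# Usage (hypothetical endpoint and parameters, performs real HTTP requests):
# payload = fetch_json("https://example.com/api/news", params={"page": 1})
# payload = fetch_json("https://example.com/api/news", method="POST", data={"page": 1})
# print(extract_titles(payload))
```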
Selenium
Libraries
- Selenium
Driver environment
- Chrome driver
Steps:
- Initiate driver
  service = Service("./chromedriver")
  driver = Chrome(service=service)
- driver.get(url)
- driver.find_element(by, value)
- driver.execute_script(javascript)
- driver.close()
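The steps above can be sketched as one small script. This is an illustrative sketch under stated assumptions: the helper names `open_chrome` and `run_steps`, the target URL, the `h1` locator, and the JavaScript snippet are all hypothetical, and the chromedriver path is the one used in the steps.

```python
from typing import Any


def open_chrome(driver_path: str = "./chromedriver"):
    """Start Chrome through a local chromedriver binary."""
    # Imported here so run_steps() below can be tried with a stand-in
    # driver on a machine without Selenium or Chrome installed.
    from selenium.webdriver import Chrome
    from selenium.webdriver.chrome.service import Service

    return Chrome(service=Service(driver_path))


def run_steps(driver: Any, url: str, by: str, value: str, script: str) -> tuple[str, Any]:
    """Load a page, read one element's text, run a script, then close the window."""
    try:
        driver.get(url)                          # open the page
        element = driver.find_element(by, value) # locate a single element
        text = element.text                      # visible text of that element
        result = driver.execute_script(script)   # run JavaScript in the page
        return text, result
    finally:
        driver.close()                           # close the window even on errors


# Usage (launches a real browser; URL and locator are hypothetical):
# from selenium.webdriver.common.by import By
# heading, title = run_steps(
#     open_chrome(), "https://example.com", By.TAG_NAME, "h1", "return document.title;"
# )
# print(heading, title)
```

`driver.close()` closes the current window, matching the last step in the list. When several windows are open, `driver.quit()` ends the whole browser session instead.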